Search results for "Computer Science - Information Retrieval"

showing 10 items of 16 documents

Distributed Real-Time Sentiment Analysis for Big Data Social Streams

2014

The big data trend has forced data-centric systems to handle continuous, fast data streams. In recent years, real-time analytics on stream data has become a new research field, which aims to answer queries about "what-is-happening-now" with negligible delay. The real challenge with real-time stream data processing is that it is impossible to store instances of data, and therefore online analytical algorithms are used. To perform real-time analytics, data should be pre-processed so that only a short summary of the stream is stored in main memory. In addition, due to high speed of arrival, average processing time for each instance of data should be in such a way that…

Data stream, FOS: Computer and information sciences, Computer Science - Computation and Language, Computer science, business.industry, Data stream mining, Sentiment analysis, Big data, Machine Learning (stat.ML), Databases (cs.DB), Data structure, computer.software_genre, Field (computer science), Computer Science - Information Retrieval, Tree (data structure), Computer Science - Databases, Computer Science - Distributed Parallel and Cluster Computing, Analytics, Statistics - Machine Learning, Data mining, Distributed Parallel and Cluster Computing (cs.DC), business, computer, Computation and Language (cs.CL), Information Retrieval (cs.IR)
researchProduct

A Neural Turing Machine for Conditional Transition Graph Modeling

2019

Graphs are an essential part of many machine learning problems such as analysis of parse trees, social networks, knowledge graphs, transportation systems, and molecular structures. Applying machine learning in these areas typically involves learning the graph structure and the relationship between the nodes of the graph. However, learning the graph structure is often complex, particularly when the graph is cyclic and the transitions from one node to another are conditioned, as in graphs used to represent a finite state machine. To solve this problem, we propose to extend the memory-based Neural Turing Machine (NTM) with two novel additions. We allow for transitions between nodes to be inf…

FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Information Retrieval (cs.IR), Computer Science - Information Retrieval
researchProduct

Multilingual Clustering of Streaming News

2018

Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever-growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient and produces state-of-the-art …

FOS: Computer and information sciences, Computer Science - Computation and Language, Information retrieval, Computer science, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, 02 engineering and technology, Clustering, Media Monitoring, Computer Science - Information Retrieval, ComputingMethodologies_PATTERNRECOGNITION, Multilingual Methods, 0202 electrical engineering electronic engineering information engineering, 020201 artificial intelligence & image processing, Cluster analysis, Computation and Language (cs.CL), Information Retrieval (cs.IR)
researchProduct

Untrue.News: A New Search Engine For Fake Stories

2020

In this paper, we demonstrate Untrue News, a new search engine for fake stories. Untrue News is easy to use and offers useful features such as: a) a multi-language option combining fake stories from different countries and languages around the same subject or person; b) a user privacy protector, avoiding the filter bubble by employing a bias-free ranking scheme; and c) a collaborative platform that fosters the development of new tools for fighting disinformation. Untrue News relies on Elasticsearch, a new scalable analytic search engine based on the Lucene library that provides near real-time results. We demonstrate two key scenarios: the first related to a politician - looking how the cat…

FOS: Computer and information sciences, Computer Science - Computers and Society, Computers and Society (cs.CY), Information Retrieval (cs.IR), Computer Science - Information Retrieval
researchProduct

Focusing Knowledge-based Graph Argument Mining via Topic Modeling

2021

Decision-making usually takes five steps: identifying the problem, collecting data, extracting evidence, identifying pro and con arguments, and making decisions. Focusing on extracting evidence, this paper presents a hybrid model that combines latent Dirichlet allocation and word embeddings to obtain external knowledge from structured and unstructured data. We study the task of sentence-level argument mining, as arguments mostly require some degree of world knowledge to be identified and understood. Given a topic and a sentence, the goal is to classify whether a sentence represents an argument in regard to the topic. We use a topic model to extract topic- and sentence-specific evidence from…

FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Information Retrieval (cs.IR), Computer Science - Information Retrieval, Machine Learning (cs.LG)
researchProduct

Combining a Context Aware Neural Network with a Denoising Autoencoder for Measuring String Similarities

2018

Measuring similarities between strings is central for many established and fast-growing research areas including information retrieval, biology, and natural language processing. The traditional approach for string similarity measurements is to define a metric over a word space that quantifies and sums up the differences between characters in two strings. The state of the art in the area has, surprisingly, not evolved much during the last few decades. The majority of the metrics are based on a simple comparison between character and character distributions without consideration for the context of the words. This paper proposes a string metric that encompasses similarities between strings bas…

FOS: Computer and information sciences, Computer Science - Machine Learning, Artificial Intelligence (cs.AI), Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computation and Language (cs.CL), Information Retrieval (cs.IR), Machine Learning (cs.LG), Computer Science - Information Retrieval
researchProduct

Open Data Quality Evaluation: A Comparative Analysis of Open Data in Latvia

2020

Nowadays open data is entering the mainstream - it is freely available to every stakeholder and is often used in business decision-making. It is important to be sure data is trustworthy and error-free, as quality problems can lead to huge losses. The research discusses how (open) data quality could be assessed. It also covers the main points that should be considered when developing a data quality management solution. One specific approach is applied to several Latvian open data sets. The research provides a step-by-step guide to analyzing open data sets and summarizes its results. It also shows that data quality can differ depending on the data supplier (centralized and decentralized d…

FOS: Computer and information sciences, General Computer Science, Computer science, media_common.quotation_subject, Stakeholder, Latvian, Databases (cs.DB), Statistics - Applications, Statistics - Computation, language.human_language, Computer Science - Information Retrieval, Computer Science - Computers and Society, Open data, Lead (geology), Computer Science - Databases, Risk analysis (engineering), Data quality, Computers and Society (cs.CY), language, Mainstream, Quality (business), Applications (stat.AP), Information Retrieval (cs.IR), Computation (stat.CO), media_common
researchProduct

At Your Service: Coffee Beans Recommendation From a Robot Assistant

2020

With advances in the field of machine learning, specifically algorithms for recommendation systems, robot assistants are envisioned to become more present in the hospitality industry. Additionally, the COVID-19 pandemic has also highlighted the need to have more service robots in our everyday lives, to minimise the risk of human-to-human transmission. One such example would be coffee shops, which have become intrinsic to our everyday lives. However, serving an excellent cup of coffee is not a trivial feat as a coffee blend typically comprises rich aromas, indulgent and unique flavours and a lingering aftertaste. Our work addresses this by proposing a computational model which recommends optima…

FOS: Computer and information sciences, Service (systems architecture), business.industry, Computer science, Feature vector, Supervised learning, Computer Science - Human-Computer Interaction, ComputingMilieux_PERSONALCOMPUTING, 02 engineering and technology, Recommender system, Machine learning, computer.software_genre, Field (computer science), GeneralLiterature_MISCELLANEOUS, Computer Science - Information Retrieval, Personalization, Human-Computer Interaction (cs.HC), 0202 electrical engineering electronic engineering information engineering, Robot, Unsupervised learning, 020201 artificial intelligence & image processing, Artificial intelligence, business, computer, Information Retrieval (cs.IR)
researchProduct

Tag2Risk: Harnessing social music tags for characterizing depression risk

2020

Musical preferences have been considered a mirror of the self. In this age of Big Data, online music streaming services allow us to capture ecologically valid music listening behavior and provide a rich source of information to identify several user-specific aspects. Studies have shown musical engagement to be an indirect representation of internal states including internalized symptomatology and depression. The current study aims at unearthing patterns and trends in the individuals at risk for depression as it manifests in naturally occurring music listening behavior. Mental well-being scores, musical engagement measures, and listening histories of Last.fm users (N=541) were acquired. Soci…

FOS: Computer and information sciences, Sound (cs.SD), Audio and Speech Processing (eess.AS), FOS: Electrical engineering electronic engineering information engineering, behavioral disciplines and activities, Information Retrieval (cs.IR), Computer Science - Multimedia, Computer Science - Sound, humanities, Multimedia (cs.MM), Computer Science - Information Retrieval, Electrical Engineering and Systems Science - Audio and Speech Processing
researchProduct

Binary jumbled string matching for highly run-length compressible texts

2012

The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size $O(n)$ in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index i…

FOS: Computer and information sciences, String algorithms, Structure (category theory), Binary number, G.2.1, Data_CODINGANDINFORMATIONTHEORY, 0102 computer and information sciences, 02 engineering and technology, String searching algorithm, 01 natural sciences, Computer Science - Information Retrieval, Theoretical Computer Science, Combinatorics, data structures, Simple (abstract algebra), Computer Science - Data Structures and Algorithms, String algorithms; jumbled pattern matching; prefix normal form; data structures, 0202 electrical engineering electronic engineering information engineering, Parikh vector, Data Structures and Algorithms (cs.DS), Run-length encoding, Mathematics, 68W32 68P05 68P20, String (computer science), prefix normal form, Substring, Computer Science Applications, jumbled pattern matching, 010201 computation theory & mathematics, Data structure, Signal Processing, Run-length encoding, 020201 artificial intelligence & image processing, Constant (mathematics), Information Retrieval (cs.IR), Information Systems, Information Processing Letters
researchProduct